Abstract: We present optimality results for robust Kalman filtering where
robustness is understood in a distributional sense, i.e., we enlarge the
distribution assumptions made in the ideal model by suitable neighborhoods.
This allows for outliers which in our context may be system-endogenous or
-exogenous, which induces the somewhat conflicting goals of tracking and
attenuation.

The corresponding minimax MSE-problems are solved for both types of 
outliers separately, resulting in closed-form saddle-points which consist of 
an optimally-robust procedure and a corresponding least favorable outlier 
situation. The results are valid in a surprisingly general setup of state space 
models, which is not limited to a Euclidean or time-discrete framework. 

The solution however involves computation of conditional means in the 
ideal model, which may pose computational problems. In the particular 
situation that the ideal conditional mean is linear in the observation in- 
novation, we come up with a straight-forward Huberization, the rLS filter, 
which is very easy to compute. For this linearity we obtain an again sur- 
prising characterization. 
1. Introduction

Robustness issues in Kalman filtering have long been a research topic, with first
(non-verified) hits on a quick search for "robust Kalman filter" on
scholar.google.com as early as 1962 and 1967, i.e., the former even before the
seminal Huber (1964) paper, often referred to as the birthday of Robust Statistics.

In the meantime there is an ever growing amount of literature on this topic:
Kassam and Poor (1985) had already compiled as many as 209 references
on the subject by 1985. Excellent surveys are given in, e.g., Kassam and Poor
(1985), Stockinger and Dutter (1987), Schick and Mitter (1994), Künsch (2001).

In these references you find many different notions of robustness, all somewhat
related to stability but measuring this stability w.r.t. deviations of various
"input parameters"; in this paper we are concerned with (distributional)
minimax robustness, i.e., we work with suitable distributional neighborhoods
about an ideal model, already used by Birmiwal and Shen (1993) and Birmiwal
and Papantoni-Kazakos (1994), and then solve the problem to find the procedure
minimizing the maximal predictive inaccuracy on these neighborhoods,
measured in terms of mean squared error (MSE), in considerable generality,
compare Theorems 3.2, 3.10, 4.1. In the particular situation that the ideal
conditional mean is linear in the observation innovation (for a definition see
subsection 2.3.2), the minimax filter is a straight-forward Huberization, the rLS
filter, which is extremely easy to compute. For this linearity we obtain a surprising
characterization in Propositions 3.4 and 3.6. This motivates a corresponding
optimal test for linearity, Proposition 3.8. Even in situations where no or only
partial knowledge of the size of the contamination is available we can distinguish
an optimal procedure, compare Lemma 3.1.

2. General setup 
2.1. Ideal model 

In this section, we start with some definitions and assumptions. We are working 
in the context of state space models (SSM's) as to be found in many textbooks, 
cf. e.g. Anderson and Moore (1979), Harvey (1991), and Durbin and Koopman 
(2001). 

2.1.1. Time Discrete, linear Euclidean Setup 

The most prominent setting in this context is the linear, time-discrete,
Euclidean setup, which will serve as reference setting in this paper: An
unobservable p-dimensional state $X_t$ evolves according to a possibly
time-inhomogeneous vector autoregressive model of order 1 (VAR(1)) with
innovations $v_t$ and transition matrices $F_t$, i.e.,

$X_t = F_t X_{t-1} + v_t$ (2.1)

The statistician observes a q-dimensional linear transformation $Y_t$ of $X_t$ and,
in doing so, makes an additive observation error $\varepsilon_t$,

$Y_t = Z_t X_t + \varepsilon_t$ (2.2)

In the ideal model we work in a Gaussian context, that is we assume 

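$v_t \sim \mathcal{N}_p(0, Q_t)$, $\varepsilon_t \sim \mathcal{N}_q(0, V_t)$, $X_0 \sim \mathcal{N}_p(a_0, Q_0)$, (2.3)
$X_0$, $v_s$, $\varepsilon_t$, $s, t \in \mathbb{N}$ stochastically independent (2.4)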

As usual, normality assumptions may be relaxed to working only with specified 
first and second moments, if we restrict ourselves to linear unbiased procedures 
as in the Gauss-Markov setting. 

For this paper, we assume the hyper-parameters $F_t$, $Z_t$, $Q_t$, $V_t$, $a_0$ to be known.
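
As a rough illustration of the ideal model (2.1)-(2.4), the following sketch simulates the scalar case p = q = 1 with time-constant hyper-parameters. The function name `simulate_ssm`, its default values and the restriction to one dimension are assumptions made for this example only; they are not part of the paper.

```python
import random


def simulate_ssm(n, F=0.9, Z=1.0, Q=1.0, V=1.0, a0=0.0, Q0=1.0, seed=0):
    """Simulate n steps of the scalar ideal model (2.1)-(2.4).

    Hypothetical helper: hyper-parameters are taken constant in time,
    which is a simplification of the time-inhomogeneous setup.
    """
    rng = random.Random(seed)
    x = rng.gauss(a0, Q0 ** 0.5)                  # X_0 ~ N(a0, Q0)
    states, observations = [], []
    for _ in range(n):
        x = F * x + rng.gauss(0.0, Q ** 0.5)      # state equation (2.1)
        y = Z * x + rng.gauss(0.0, V ** 0.5)      # observation equation (2.2)
        states.append(x)
        observations.append(y)
    return states, observations


xs, ys = simulate_ssm(200)
```

Only `ys` would be available to the statistician; `xs` is the unobservable state path that a filter tries to reconstruct.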
2.1.2. Generalizations covered by the present approach

Parts of our results (more specifically, all of sections 3.2, 3.4) also cover much
more general SSMs; in this paragraph we sketch some of these. To begin with, as
long as MSE makes sense for the range of the states, these results cover general
Hidden Markov Models for arbitrary observation space as given by

$X_0 \sim P^{X_0}$, (2.5)
$X_t \mid X_{t-1} = x_{t-1} \sim P^{X_t \mid X_{t-1} = x_{t-1}}$, (2.6)
$Y_t \mid X_t = x_t \sim P^{Y_t \mid X_t = x_t}$ (2.7)

In this setting, we assume known (and existing) [regular conditional] densities
of these laws w.r.t. known measures on the state space and the observation
space, respectively. Dynamic (generalized) linear models as discussed in West
et al. (1985) and West and Harrison (1989) are covered as well, under
corresponding assumptions as to (conditional) densities and range of the states.
In applications of Mathematical Finance we also need to cover continuous time
settings, i.e., there is an
unobservable state evolving according to an SDE

$dX_t = f(t, X_t)\,dt + q(t, X_t)\,dW_t$ (2.8)

where for $X_0$ we assume (2.5), while $W_t$ is a Wiener process, and $f$ and $q$ are
suitably measurable, known functions, and observations $Y_t$ are either formulated
as a time-continuous observation process (as in Tang (1998)) or, more often,
at discrete, but not necessarily equally spaced times, compare, e.g. Nielsen et
al. (2000) and Singer (2002). In this context, but also for corresponding
non-linear time-discrete SSMs, a straightforward approach linearizes the
corresponding transition and observation functions to give the
(continuous-discrete) Extended Kalman Filter (EKF). After this linearization we
are again in the context of a (time-inhomogeneous) linear SSM, hence the
methodology we develop in the sequel applies to this setting as well.

So far we do not cover approaches to improve on this simple linearization,
notably the second order nonlinear filter (SNF) introduced in Jazwinski (1970),
also cf. Singer (2002, sec. 4.3.1), the unscented Kalman filter (UKF) (Julier
et al., 2000) and Hermite expansions as in Aït-Sahalia (2002), see also Singer
(2002, sec. 4.3).

Going one more step ahead, to cover applications such as portfolio optimization,
we may allow for controls $U_t$ to be set or determined by the statistician,
and which are fed back in the state equations. In the context of the continuous
time model, this is also known as SDEX, cf. Nielsen et al. (2000), and for the
application of stochastic control to portfolio optimization, cf. Korn (1997). In this
setting, controls $U_t$ are usually assumed measurable w.r.t. $\sigma(Y_{t-})$; to integrate
them into our setting, we simply have to integrate them in the corresponding
condition vectors.

Finally, the question of specifying the order of conditioning left aside, we do
not make use of the linearity of time, so our minimax results also cover suitable 
formulations of indirectly observed random fields. 

2.2. Deviations from the ideal model 

As usual with Robust Statistics, the ideal model assumptions we have specified 
so far are extended by allowing (small) deviations, most prominently generated 
by outliers. In our notation, suffix "id" indicates the ideal setting, "di" the
distorting (contaminating) situation, "re" the realistic, contaminated situation.

2.2.1. AO's and IO's

In SSM context (and contrary to the independent setting), outliers may or
may not propagate. Following the terminology of Fox (1972), we distinguish
innovation outliers (or IO's) and additive outliers (or AO's). Historically, AO's
denote gross errors affecting the observation errors, i.e.,

AO :: $\varepsilon_t^{re} \sim (1 - r_{AO})\,\mathcal{L}(\varepsilon_t^{id}) + r_{AO}\,\mathcal{L}(\varepsilon_t^{di})$ (2.9)

where $\mathcal{L}(\varepsilon_t^{di})$ is arbitrary, unknown and uncontrollable (a.u.u.) and
$0 \le r_{AO} \le 1$ is the AO-contamination radius, i.e., the probability for an AO.
IO's on the other hand are usually defined as outliers which affect the innovations,

IO :: $v_t^{re} \sim (1 - r_{IO})\,\mathcal{L}(v_t^{id}) + r_{IO}\,\mathcal{L}(v_t^{di})$ (2.10)

where again $\mathcal{L}(v_t^{di})$ is a.u.u. and $0 \le r_{IO} \le 1$ is the corresponding radius.

We stick to this distinction for consistency with literature, although we rather
use these terms in a wider sense, unless explicitly otherwise stated: IO's denote
endogenous outliers affecting the state equation in general, hence distortion
propagates into subsequent states. This also covers level shifts or linear trends,
which, if $|F_t| < 1$, are not included in (2.10), as IO's would then decay
geometrically in $t$. We also extend the meaning of AO's to denote general exogenous
outliers which enter the observation equation only and thus do not propagate,
like substitutive outliers or SO's defined as

SO :: $Y_t^{re} \sim (1 - r_{SO})\,\mathcal{L}(Y_t^{id}) + r_{SO}\,\mathcal{L}(Y_t^{di})$ (2.11)

where again $\mathcal{L}(Y_t^{di})$ is a.u.u. and $0 \le r_{SO} \le 1$ is the corresponding radius.

Apparently, the SO-ball of radius $r$ consisting of all $\mathcal{L}(Y_t^{re})$ according to (2.11)
contains the corresponding AO-ball of the same radius when $Y_t^{re} = Z_t X_t + \varepsilon_t^{re}$.
However, for technical reasons, we make the additional assumption that

$Y_t^{id}$, $Y_t^{di}$ stochastically independent (2.12)

and then this relation no longer holds.
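
The difference between propagating (IO) and non-propagating (AO) outliers in this subsection can be made concrete with a noise-free toy path. The sketch below is only an illustration under simplifying assumptions (scalar state, $Z_t = 1$, constant $F$, a single deterministic outlier instead of the random contamination in (2.9)-(2.10)); the function name `path` is hypothetical.

```python
def path(n, F, outlier_time, size, kind):
    """Noise-free scalar observation path with one outlier of the given kind.

    kind == "AO": the outlier is added to the observation error only.
    kind == "IO": the outlier is added to the innovation and hence propagates.
    """
    x, ys = 0.0, []
    for t in range(1, n + 1):
        v = size if (kind == "IO" and t == outlier_time) else 0.0
        e = size if (kind == "AO" and t == outlier_time) else 0.0
        x = F * x + v        # state equation (2.1)
        ys.append(x + e)     # observation equation (2.2) with Z_t = 1
    return ys


ao = path(8, 0.5, 3, 8.0, "AO")  # [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0]
io = path(8, 0.5, 3, 8.0, "IO")  # [0.0, 0.0, 8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
```

At the time of the outlier both paths look identical, which is the identification problem; only later observations reveal whether the distortion propagates, and for $|F| < 1$ the IO effect decays geometrically.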
2.2.2. Different and competing goals induced by endogenous and exogenous
outliers

In the presence of AO's we would like to attenuate their effect, while when there
are IO's, the usual goal in online applications would be tracking, i.e., to detect
structural changes as fast as possible and/or react to the changed situation. A
situation where both AO's and IO's may occur poses an identification problem:
Immediately after a suspicious observation we cannot tell IO type from AO
type. Hence a simultaneous treatment of both types will only be possible with
a certain delay; see Ruckdeschel (2010).



2.3. Classical Method: Kalman Filter 



2.3.1. Filter Problem 



The most important problem in SSM formulation is to reconstruct the unob- 
servable states Xt based on the observations Yt. For abbreviation let us denote 

$Y_{1:t} := (Y_1, \dots, Y_t)$, $Y_{1:0} := \emptyset$ (2.13)
Then using MSE risk, the optimal reconstruction is distinguished as

$\mathrm{E}\,|X_t - f_t|^2 = \min_{f_t}!$, $f_t$ measurable w.r.t. $\sigma(Y_{1:s})$ (2.14)

Depending on $s$ this is a prediction ($s < t$), a filtering ($s = t$) and a smoothing
problem ($s > t$). In the sequel we will confine ourselves to the filtering problem.



2.3.2. Kalman-Filter 

It is well-known that the general solution to (2.14) is the corresponding
conditional expectation $\mathrm{E}[X_t | Y_{1:s}]$. Except for the Gaussian case, this exact
conditional expectation may be computationally too expensive. Hence similar to the
Gauss-Markov setting, it is common to restrict oneself to linear filters. In this
context, the seminal work of Kalman (1960) (discrete-time setting) and Kalman
and Bucy (1961) (continuous-time setting) introduced effective schemes to
compute this optimal linear filter $X_{t|t}$. In discrete time, we reproduce it here for
later reference:

Init.: $X_{0|0} = a_0$, $\Sigma_{0|0} = Q_0$ (2.15)

Pred.: $X_{t|t-1} = F_t X_{t-1|t-1}$, $\Sigma_{t|t-1} = F_t \Sigma_{t-1|t-1} F_t^\top + Q_t$ (2.16)

Corr.: $X_{t|t} = X_{t|t-1} + M_t^0 \Delta Y_t$, $\Sigma_{t|t} = (I_p - M_t^0 Z_t)\,\Sigma_{t|t-1}$ (2.17)
for $\Delta X_t = X_t - X_{t|t-1}$, $\Delta Y_t = Y_t - Z_t X_{t|t-1} = Z_t \Delta X_t + \varepsilon_t$,
$\Delta_t = Z_t \Sigma_{t|t-1} Z_t^\top + V_t$, $M_t^0 = \Sigma_{t|t-1} Z_t^\top \Delta_t^-$ (2.18)

and where $\Delta X_t$ is the prediction error, $\Delta Y_t$ the observation innovation, and
$\Sigma_{t|t-1} = \mathrm{Cov}(\Delta X_t)$, $\Sigma_{t|t} = \mathrm{Cov}(X_t - X_{t|t})$, $\Delta_t = \mathrm{Cov}(\Delta Y_t)$; $M_t^0$ is the
so-called Kalman gain, and $\Delta_t^-$ stands for the Moore-Penrose inverse of $\Delta_t$.
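
The recursions (2.15)-(2.18) can be sketched in code as follows. This is a minimal illustration for the scalar case p = q = 1 with time-constant hyper-parameters; the function name `kalman_filter` and these simplifications are assumptions of the example, not part of the paper.

```python
def kalman_filter(ys, F, Z, Q, V, a0, Q0):
    """Scalar version of the Kalman recursions (2.15)-(2.18).

    Returns a list of pairs (X_{t|t}, Sigma_{t|t}), one per observation.
    """
    x, s = a0, Q0                          # Init. (2.15)
    filtered = []
    for y in ys:
        x_pred = F * x                     # Pred. (2.16): state
        s_pred = F * s * F + Q             # Pred. (2.16): error variance
        dy = y - Z * x_pred                # observation innovation
        delta = Z * s_pred * Z + V         # variance of the innovation (2.18)
        # In one dimension the Moore-Penrose inverse of 0 is 0.
        gain = s_pred * Z / delta if delta > 0.0 else 0.0
        x = x_pred + gain * dy             # Corr. (2.17): state
        s = (1.0 - gain * Z) * s_pred      # Corr. (2.17): error variance
        filtered.append((x, s))
    return filtered


result = kalman_filter([1.0], F=1.0, Z=1.0, Q=1.0, V=1.0, a0=0.0, Q0=0.0)
```

Note that the observation enters the correction only through the linear term `gain * dy`, so a single very large `dy` moves the filtered state arbitrarily far.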
2.3.3. Optimality of the Kalman-Filter

Realizing that $M_t^0 \Delta Y_t$ is an orthogonal projection, it is not hard to see that the
(classical) Kalman filter solves problem (2.14) (for $s = t$) among all linear filters.
Using orthogonality of $\{\Delta Y_t\}_t$ once again, we may set up similar recursions for
the corresponding best linear smoother; see, e.g. Anderson and Moore (1979),
Durbin and Koopman (2001). Under normality, i.e., assuming (2.3), we even
have $X_{t|t[-1]} = \mathrm{E}[X_t | Y_{1:t[-1]}]$, i.e., the Kalman filter is optimal among all
$Y_{1:t[-1]}$-measurable filters. It also is the posterior mode of $\mathcal{L}(X_t | Y_{1:t})$ and $X_{t|t}$ can also
be seen to be the ML estimator for a regression model with random parameter;
for the last property, compare Duncan and Horn (1972).

2.3.4. Features of the Kalman-Filter

The Kalman filter stands out for its clear and understandable structure: it comes
in three steps, all of which are linear, hence cheap to evaluate and easy to
interpret. Due to the Markovian structure of the state equation, all information
from the past useful for the future may be captured in the value of $X_{t|t-1}$, so
only very limited memory is needed.

From a (distributional) Robustness point of view, this linearity at the same time
is a weakness of this filter: $y$ enters unbounded into the correction step, which
hence is prone to outliers. A good robustification of this approach would try to
retain as much as possible from these positive properties of the Kalman filter
while revising the unboundedness in the correction step.

3. The rLS as optimally-robust filter 
3.1. Definition

3.1.1. robustifying recursive Least Squares: rLS 

In a first step we limit ourselves to AO's. Notationally, where clear from the 
context, we suppress the time index t. As no (new) observations enter the ini- 
tialization and prediction steps, these steps may be left unchanged. In the cor- 
rection step, we will have to modify the orthogonal projection present in (2.17). 
Suggested by H. Rieder and worked out in Ruckdeschel (2001, ch. 2), the following
robustification of the correction step is straightforward: Instead of $M^0 \Delta Y$,
we use a Huberization of this correction

$H_b(M^0 \Delta Y) = M^0 \Delta Y \,\min\{1, b/|M^0 \Delta Y|\}$ (3.1)

for some suitably chosen clipping height $b$. Apparently, this proposal removes
the unboundedness problem of the classical Kalman filter while still remaining
reasonably simple; in particular this modification is non-iterative, hence
especially useful for online-purposes.
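
To make the Huberization (3.1) concrete, here is a small sketch: the correction vector is left untouched if its Euclidean norm is at most $b$ and is shrunk onto the sphere of radius $b$ otherwise. The names `huberize` and `rls_correction` are hypothetical helpers introduced for this illustration, and the second function covers the scalar correction step only.

```python
import math


def huberize(z, b):
    """H_b(z) = z * min(1, b / |z|) as in (3.1), |.| the Euclidean norm, b > 0."""
    norm = math.sqrt(sum(c * c for c in z))
    if norm <= b:
        return list(z)                       # small corrections pass unchanged
    return [c * b / norm for c in z]         # large corrections are clipped to length b


def rls_correction(x_pred, gain, dy, b):
    """Scalar rLS correction: prediction plus the huberized classical correction."""
    return x_pred + huberize([gain * dy], b)[0]


clipped = huberize([3.0, 4.0], 2.5)          # norm 5 is clipped to norm 2.5
```

The direction of the classical correction is kept and only its length is bounded, so no iteration is needed.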
3.1.2. Choice of the clipping height b

For the choice of the clipping height $b$, we have two proposals. Both are based
on the simplifying assumption that $\mathrm{E}_{id}[\Delta X | \Delta Y]$ is linear, which will turn out
to only be approximately right. The first one, an Anscombe criterion, chooses
$b = b(\delta)$ such that

$\mathrm{E}_{id}\,|\Delta X - H_b(M^0 \Delta Y)|^2 = (1 + \delta)\,\mathrm{E}_{id}\,|\Delta X - M^0 \Delta Y|^2$ (3.2)

$\delta$ may be interpreted as "insurance premium" to be paid in terms of loss of
efficiency in the ideal model compared to the optimal procedure in this (ideal)
setting, i.e., the classical Kalman filter.

The second criterion for a given radius $r \in [0, 1]$ of the (SO-) neighborhood
$\mathcal{U}^{SO}(r)$ determines $b = b(r)$ such that

$(1 - r)\,\mathrm{E}_{id}\,(|M^0 \Delta Y| - b)_+ = r\,b$ (3.3)

Assuming linear ideal conditional expectations, this will produce the minimax-
MSE procedure for $\mathcal{U}^{SO}(r)$ according to Theorem 3.2 below.
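
As an illustration of criterion (3.3), the following sketch solves for $b(r)$ by bisection in the scalar case, assuming that $M^0 \Delta Y$ is ideally distributed as $\mathcal{N}(0, s^2)$, so that $\mathrm{E}(|W| - b)_+$ has a closed form in terms of the standard normal density and distribution function. The function names and the restriction to one dimension are assumptions of this example.

```python
import math


def expected_excess(b, s):
    """E (|W| - b)_+ for W ~ N(0, s^2), b >= 0, s > 0."""
    u = b / s
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # normal density at u
    upper_tail = 0.5 * math.erfc(u / math.sqrt(2.0))          # P(N(0,1) > u)
    return 2.0 * (s * phi - b * upper_tail)


def clipping_height(r, s, tol=1e-10):
    """Solve (1 - r) E(|W| - b)_+ = r b for b, cf. (3.3), with 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("radius r must lie strictly between 0 and 1")

    def g(b):
        return (1.0 - r) * expected_excess(b, s) - r * b      # decreasing in b

    lo, hi = 0.0, s
    while g(hi) > 0.0:                                        # bracket the root
        hi *= 2.0
    while hi - lo > tol * s:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = clipping_height(0.1, 1.0)
```

Larger radii give smaller clipping heights, i.e. more aggressive clipping; for $r \to 0$ the solution diverges, which corresponds to the unclipped classical Kalman filter.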

One might object that (3.3) assumes $r$ to be known, which in practice hardly
ever is true. If $r$ is unknown however, we translate an idea worked out in Rieder
et al. (2008): Assume we have limited knowledge about $r$, say $r \in [r_l, r_u]$,
$0 \le r_l < r_u \le 1$. Then we distinguish a least favorable radius $r_0$ defined in the
following expressions

$r_0 = \operatorname{argmin}_{s \in [r_l, r_u]} \rho_0(s)$, $\rho_0(s) = \max_{r \in [r_l, r_u]} \rho(r, s)$, (3.4)

$\rho(r, s) = \dfrac{\max_{P \in \mathcal{U}^{SO}(r)} \mathrm{MSE}_P(\mathrm{rLS}(b(s)))}{\max_{P' \in \mathcal{U}^{SO}(r)} \mathrm{MSE}_{P'}(\mathrm{rLS}(b(r)))}$ (3.5)

and use the corresponding $b(r_0)$. Procedure $\mathrm{rLS}(b(r_0))$ then minimizes the
maximal inefficiency $\rho_0(s)$ among all procedures $\mathrm{rLS}(b(r))$, i.e., each rLS for some
clipping height $b(r) \ne b(r_0)$ has an inefficiency no smaller than $\rho_0(r_0)$ for some
$r' \in [r_l, r_u]$. Radius $r_0$ can be computed quite effectively by a bisection method:
Let

$A_r = \mathrm{E}_{id}\big[\operatorname{tr} \mathrm{Cov}_{id}[\Delta X | \Delta Y^{id}] + (|M^0 \Delta Y^{id}| - b(r))_+^2\big]$ (3.6)
$B_r = \mathrm{E}_{id}\big[|M^0 \Delta Y^{id}|^2 - (|M^0 \Delta Y^{id}| - b(r))_+^2\big] + b(r)^2$ (3.7)

Then the following analogue to Kohl (2005, Lemma 2.2.3) holds:

Lemma 3.1. In equations (3.4) and (3.5), let $r, s$ vary in $[r_l, r_u]$ with
$0 \le r_l < r_u \le 1$. Then

$\rho_0(r) = \max\{A_r / A_{r_l},\, B_r / B_{r_u}\}$ (3.8)

and there exists some $r_0 \in [r_l, r_u]$ such that $A_{r_0}/A_{r_l} = B_{r_0}/B_{r_u}$. This $r_0$ is least
favorable, i.e., $\min_{r \in [r_l, r_u]} \rho_0(r) = \rho_0(r_0)$. Moreover, if $r_u = 1$, $r_0 = r_u$.

In particular, the last equality shows that one should restrict $r_u$ to be strictly
smaller than 1 to get a sensible procedure.
3.2. (One-Step)-Optimality of the rLS

The (so-far) ad-hoc robustification proposed in the rLS filter has some
remarkable optimality properties: Let us first forget about the time structure and
instead consider the following simplified, but general "Bayesian" model:

We have an unobservable but interesting signal $X \sim P^X(dx)$, where for
technical reasons we assume that in the ideal model $\mathrm{E}\,|X|^2 < \infty$. Instead of $X$ we
rather observe a random variable $Y$ taking values in an arbitrary space of which
we know the ideal transition probabilities; more specifically, we assume that
these ideal transition probabilities for almost all $x$ have densities w.r.t. some
measure $\mu$,

$P^{Y|X=x}(dy) = \pi(y, x)\,\mu(dy)$ (3.9)

Our approach uses MSE as accuracy criterion for the reconstruction, so is limited
to ranges of $X$ where this makes sense. On the other hand it is this reduction to
the "Bayesian" model which makes the generalizations sketched in section 2.1
possible. As (wide-sense) AO model, we consider an SO outlier model, i.e.,

$Y^{re} = (1 - U)\,Y^{id} + U\,Y^{di}$, $U \sim \mathrm{Bin}(1, r)$ (3.10)

for $U$ independent of $(X, Y^{id})$ and $(X, Y^{di})$ and some distorting random variable
$Y^{di}$ for which, in a slight variation of condition (2.12), we assume

$Y^{di}$, $X$ stochastically independent (3.11)

and the law of which is arbitrary, unknown and uncontrollable. As a first step
consider the set $\partial\mathcal{U}^{SO}(r)$ defined as

$\partial\mathcal{U}^{SO}(r) = \{\, \mathcal{L}(X, Y^{re}) \mid Y^{re} \text{ acc. to (3.10) and (3.11)} \,\}$ (3.12)

Because of condition (3.11), in the sequel we refer to the random variables $Y^{re}$
and $Y^{di}$ instead of their respective (marginal) distributions only, while in the
common gross error model as present in (2.9) or (2.10), reference to the
respective distributions would suffice. Condition (3.11) also entails that in general,
contrary to the usual setting, $\mathcal{L}(X, Y^{id})$ is not an element of $\partial\mathcal{U}^{SO}(r)$, i.e., not
representable itself as some $\mathcal{L}(X, Y^{re})$ in this neighborhood. As corresponding
(convex) neighborhood we define

$\mathcal{U}^{SO}(r) = \bigcup_{0 \le s \le r} \partial\mathcal{U}^{SO}(s)$ (3.13)

Of course, $\mathcal{U}^{SO}(r)$ contains $\mathcal{L}(X, Y^{id})$. In the sequel where clear from the context
we drop the superscript SO and the argument $r$.

With this setting we may formulate two typical robust optimization problems: 
Minimax-SO problem Minimize the maximal MSE on an SO-neighborhood,
i.e., find a measurable reconstruction $f_0$ for $X$ s.t.

$\max_{\mathcal{U}} \mathrm{E}_{re}\,|X - f(Y^{re})|^2 = \min_f!$ (3.14)

Lemma5-SO problem As an analogue to Hampel (1968, Lemma 5), minimize
the MSE in the ideal model but subject to a bound on the bias to be fulfilled
on the whole neighborhood, i.e., find a measurable reconstruction $f_0$ for $X$ s.t.

$\mathrm{E}_{id}\,|X - f(Y^{id})|^2 = \min_f!$ s.t. $\sup_{\mathcal{U}} |\mathrm{E}_{re} f(Y^{re}) - \mathrm{E} X| \le b$ (3.15)

The solution to both problems can be summarized as 

Theorem 3.2 (Minimax-SO, Lemma5-SO). (1) In this situation, there is a
saddle-point $(f_0, P_0^{Y^{di}})$ for Problem (3.14)

$f_0(y) := \mathrm{E} X + D(y)\,w_r(D(y))$, $w_r(z) = \min\{1, \rho/|z|\}$ (3.16)
$P_0^{Y^{di}}(dy) := \frac{1-r}{r}\,(|D(y)|/\rho - 1)_+\,P^{Y^{id}}(dy)$ (3.17)

where $\rho > 0$ ensures that $\int P_0^{Y^{di}}(dy) = 1$ and

$D(y) = \mathrm{E}_{id}[X | Y = y] - \mathrm{E} X$ (3.18)
The value of the minimax risk of Problem (3.14) is

$\operatorname{tr} \mathrm{Cov}(X) - (1 - r)\,\mathrm{E}_{id}\big[\,|D(Y^{id})|^2\,w_r(D(Y^{id}))\,\big]$ (3.19)

(2) $f_0$ from (3.16) also is the solution to Problem (3.15) for $b = \rho/r$.

(3) If $\mathrm{E}_{id}[X|Y]$ is linear in $Y$, i.e., $\mathrm{E}_{id}[X|Y] = MY$ for some matrix $M$, then
necessarily

$M = M^0 = \mathrm{Cov}(X, Y)\,\mathrm{Var}\,Y^-$ (3.20)

or in SSM formulation: $M^0$ is just the classical Kalman gain and $f_0$ the
(one-step) rLS.



3.2.1. Identifications for the SSM context 

Identifying $X$ in model (3.9) with $\Delta X_t$ and $\pi(y, x)\,\mu(dy)$ with
$\mathcal{L}(Z_t \Delta X_t + \varepsilon_t)(dy)$, our "Bayesian" Model (3.9) covers the SSM context.
Hence, if $\Delta X_t$ is normal, (3) applies and rLS is SO-minimax.



3.2.2. Example for SO-least favorable densities 

To illustrate the result of Theorem 3.2, we have plotted the ideal density of $P^{Y^{id}}$,
the (least favorable) contaminated density of $P_0^{Y^{re}}$, and the (least favorable)
contaminating density of $P_0^{Y^{di}}$ in Figure 1.

Fig 1. Densities of $P^{Y^{id}}$, $P_0^{Y^{re}}$, $P_0^{Y^{di}}$ for $P^X = P^\varepsilon = \mathcal{N}(0, 1)$, r = 0.1; note the "thin" tails.



Remark 3.3. (a) Without using this name, SO neighborhoods have already been
used by Birmiwal and Shen (1993) and Birmiwal and Papantoni-Kazakos (1994),
although only in a one-dimensional model.

(b) Explicit solutions to robust optimization problems in a finite sample setting are 
rare, which is why one usually appeals to asymptotics instead. Important exceptions 
are Huber (1968), Huber and Strassen (1973), and even there, in the former case one is 
limited to a special loss function and to one dimension. Our results however are valid 
in a finite sample context and in whole generality. 

(c) Although the structure of our model resembles a location model (interpreting $X$
as a random location parameter), our saddle-point differs from the one obtained in
Huber (1964). To see this, let us look at the tails of the least favorable $P_0^{Y^{di}}$, assuming
a Gaussian model for simplicity: while in Huber's setting the tails decay as $c\,e^{-k|x|}$ for
some $c, k > 0$, in our setting they decay as $c'|x|\,e^{-k'x^2}$ for some $c', k' > 0$, so appear
even "less harmful" than in the location case.

(d) Attempts to solve corresponding optimization problems in a (narrow-sense) AO
neighborhood are much more difficult and only partial results in this context have been
obtained in Donoho (1978), Bickel (1981), and Bickel and Collins (1983); in particular
one knows that in the setup of our example the least favorable contaminating
distribution must be discrete with only possible accumulation points $\pm\infty$. In addition,
existence of a saddle-point follows from abstract compactness and continuity arguments,
but in order to obtain specific solutions one has to resort to numeric approximation
techniques as e.g. worked out in Ruckdeschel (2001, sec. 8.3); in particular, one obtains
redescending optimal filters.

(e) Redescenders are also used in the ACM filter by Masreliez and Martin (1977)
which formally translates the Huber (1964) minimax variance result to this dynamic
setting (formally, because of the randomness of the "location parameter" $\Delta X$). It
should be noted though that the least-favorable SO-situation for the ACM then is not
in the tails but rather where the corresponding $\psi$ function takes its maximum in
absolute value. An SO outlier could easily place contaminating mass on this maximum,
while this is much harder if not impossible to achieve in a (narrow-sense) AO
situation. Hence in simulations where we produce "large" outliers, the ACM filter tends to
outperform the rLS filter, as these "large" outliers are least favorable for the rLS but
not for the ACM. The "inliers" producing the least favorable situation for the ACM
on the other hand will be much harder to detect on naive data inspection than "large"
outliers, in particular in higher dimensions.

Peter Ruckdeschel/ Optimally Robust Kalman Filtering

3.3. Back in the $\Delta X$ Model for t > 1

So far, in this section, we have ignored the fact that our $X$ in model (3.9) resp.
$\Delta X_t$ in the SSM context will stem from a past which has already used our
robustified version of the Kalman filter. In particular, the law of $\Delta X_t$ (even in
the ideal model) is not straightforward and hence the (ideal) conditional expectations
appearing in the optimal solution $f_0$ in Theorem 3.2 in practice are not so easily
computable.

3.3.1. Approaches to go back 

The issue of assessing the law of $\Delta X_t$ from a non-linear filter past is common to other
robustifications, and hence there already exist a couple of approaches to deal
with it: Masreliez and Martin (1977) and Martin (1979) assume $\mathcal{L}(\Delta X_t)$ normal
and propose using robust location estimators (with redescending $\psi$-function) as
alternatives to the linear correction step. Contradicting this assumption in the
rLS case, we have the following proposition:

Proposition 3.4. Whenever in one correction step in the $\Delta X_t$ past one has
used the rLS-filter, then $\{\Delta X_t\}$ (as a process) cannot be normally distributed;
this assertion cannot even hold asymptotically, as long as

$0 < \liminf_t b_t \le \limsup_t b_t < \infty$ (3.21)

Similar assertions can also be proven for particular $\psi$-functions used in the
ACM filter of Masreliez and Martin (1977) and Martin (1979).

Schick (1989) and Schick and Mitter (1994) use Taylor-expansions for
non-normal $\mathcal{L}(\Delta X_t)$; doing so they end up with stochastic error terms but do not
give an indication as to uniform integrability. Hence it is not clear whether
the approximation stays valid after integration. More importantly, at time
instance $t$, they come up with a bank of (at least $t$) Kalman-filters which is not
operational.

Birmiwal and Shen (1993) work with the exact $\mathcal{L}(\Delta X_t)$ and hence have to
split up the integration according to the history of outlier occurrences, which
yields $2^t$ different terms, which is not operational either.

Remark 3.5. One of the features of the ideal Gaussian model is that $\mathrm{E}_{id}[\Delta X_t | Y_{1:t}]$
is Markovian in the sense that $\mathrm{E}_{id}[\Delta X_t | Y_{1:t}] = \mathrm{E}_{id}[\Delta X_t | \Delta Y_t]$, hence only depends on
the one value of $\Delta Y_t$. When using bounded correction steps, however, this property
gets lost, hence the restriction to strictly recursive procedures as is the rLS filter is a
real restriction.

Theorem 3.2 does not make any normality assumptions, but in assertion (3),
we have seen that the rLS would be optimal once we can show that $\mathrm{E}_{id}[\Delta X_t | \Delta Y_t]$
for $\Delta X$ stemming from an rLS past is linear. This leads to the question: When
is $\mathrm{E}_{id}[\Delta X | \Delta Y]$ linear? Omitting time indices $t$, the answer is

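Proposition 3.6. Assume $\operatorname{rk}(I_p - MZ) = p$, $p = q$ and $\operatorname{rk} Z = p$, and that

$\mathcal{L}_{id}(\varepsilon) = \mathcal{N}_q(0, V)$, $\varepsilon$ independent of $\Delta X$ (3.22)

Then $\mathrm{E}_{id}[\Delta X | \Delta Y]$ is linear

$\iff$ $\mathcal{L}_{id}(\Delta X)$ is normal (3.23)
$\iff$ $M_3(e) := \mathrm{E}_{id}\big[(e^\top(\Delta X - \mathrm{E}[\Delta X | \Delta Y]))^3 \,\big|\, \Delta Y = y\big] = 0$ $\forall e \in \mathbb{R}^p$ (3.24)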

Remark 3.7. (a) Assumption rk(Ip — MZ) = p is fulfilled in most situations; oth- 
erwise there is a one-dimensional projection of the filter error that is almost sure. 

(b) For $Z$ non-invertible, in particular for $p \ne q$, equivalence (3.23) still holds, if we
require

$\mathcal{L}_{id}(\Pi \Delta X) = \mathcal{N}_p(0, \Pi \Sigma \Pi)$, $\Pi \Delta X$ independent of $\bar\Pi \Delta X$ (3.25)

where $\Pi$ is the projector onto $\ker Z$ and $\bar\Pi = I_p - \Pi$. In fact we prove Proposition 3.6
in this more general case. Assumption (3.25) is needed, as $\Pi \Delta X$ is invisible for $\Delta Y$.

(c) Equivalence (3.23) together with Proposition 3.4 shows that, stemming from an 
rLS-past, rLS can only be SO-optimal in the very first time step. 

(d) Simulations however show that rLS gives very reasonable results. So in fact we
could/should be close to an ideal linear conditional expectation. "Closeness" to
linearity could be quantified by the second derivative $\partial^2/\partial y^2\, \mathrm{E}_{id}[\Delta X | \Delta Y = y]$, which in
fact leads us to expression (3.24).

(e) Equivalence (3.24), i.e., conditional unskewedness of $\Delta X$, is somewhat surprising,
as it seems much weaker than normality of the prediction error.

(f) Condition (3.22) could be relaxed to $\varepsilon \sim P$, $P$ some infinitely divisible
distribution, and the normality assumption in (3.25) be dropped. Equivalence (3.23) would
then become: For each $M \in \mathbb{R}^{p \times q}$ there can be at most one distribution $Q = Q(M, P)$
on $\mathbb{R}^p$, such that $\mathrm{E}[\Delta X | \Delta Y] = M \Delta Y$ for $\mathcal{L}(\Pi \Delta X) = Q$; for $p = q = 1$ and $Z \ne 0$,
there always is such a $Q$; see Ruckdeschel (2001, Thm. 1.3.1).

3.3.2. A test for linearity 

In a particle filter context, where many stochastically independent filter
realizations are simulated in parallel, Proposition 3.6 suggests the following test
for linearity / normality:

Proposition 3.8. Let $\Delta X_i$, $i = 1, \dots, n$ be an i.i.d. sample from $\mathcal{L}(\Delta X_t)$,
the law of the prediction errors of some filter at time $t$; let $\Sigma = \mathrm{Cov}(\Delta X_t)$,
$\sigma^2$ its maximal eigenvalue and $e$ a corresponding eigenvector (of norm 1); let
$\Sigma_n$, $\sigma_n^2$, and $e_n$ be the corresponding empirical counterparts (all assumed
consistent). Define the test statistic $T_n = \frac{1}{n} \sum_{i=1}^n (e_n^\top \Delta X_i)^3$. Then under
normality of $\Delta X_t$,

$\sqrt{n}\,T_n \longrightarrow \mathcal{N}(0, 15\sigma^6)$ (3.26)

and the test

$\mathrm{I}\big(|T_n| > \sqrt{15/n}\;\sigma_n^3\,u_{\alpha/2}\big)$ (3.27)

for $u_\alpha$ the upper $\alpha$-quantile of $\mathcal{N}(0, 1)$ is asymptotically most powerful among
all unbiased level-$\alpha$-tests for testing

$H_0$: $\sup_{|e|=1} M_3(e) = 0$ vs. $H_1$: $\sup_{|e|=1} |M_3(e)| > 0$ (3.28)
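
A minimal sketch of the test in Proposition 3.8 for the one-dimensional case, where the eigenvector reduces to $e_n = 1$ and $\sigma_n^2$ to the empirical variance. The function name `skewness_test`, its return format and the choice of the (uncorrected) empirical variance as the consistent estimator are assumptions made for this illustration.

```python
import math
from statistics import NormalDist, pvariance


def skewness_test(sample, alpha=0.05):
    """One-dimensional version of the test (3.27).

    Returns (T_n, threshold, reject), where reject is True if
    |T_n| > sqrt(15 / n) * sigma_n^3 * u_{alpha/2}.
    """
    n = len(sample)
    t_n = sum(x ** 3 for x in sample) / n            # test statistic T_n
    sigma_n = math.sqrt(pvariance(sample))           # empirical standard deviation
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # upper alpha/2 quantile of N(0, 1)
    threshold = math.sqrt(15.0 / n) * sigma_n ** 3 * u
    return t_n, threshold, abs(t_n) > threshold


symmetric = [-1.0, 1.0] * 50                         # no skewness at all
skewed = [-1.0] * 90 + [9.0] * 10                    # mean zero, strongly skewed
```

In a particle filter setting, `sample` would hold the simulated prediction errors of the parallel filter realizations at a fixed time $t$.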

3.4. Way out: eSO-Neighborhoods

One explanation for the good empirical findings for the rLS is given by a further
extension of the original SO-neighborhoods, the extended SO or eSO-model:
In this model, we also allow for model deviations in $X$, i.e., we assume a realistic
$(X^{re}, Y^{re})$ according to

$(X^{re}, Y^{re}) := (1 - U)\,(X^{id}, Y^{id}) + U\,(X^{di}, Y^{di})$ (3.29)

for $X^{id} \sim P^{X^{id}}$, $Y^{id}$ according to equation (3.9), $X^{di} \sim P^{X^{di}}$, $Y^{di} \sim P^{Y^{di}}$,
$U \sim \mathrm{Bin}(1, r_{eSO})$, where

$U$ and $(X^{id}, Y^{id})$ independent as well as (mutually) $U$, $X^{di}$, $Y^{di}$ (3.30)

and the joint law $P^{X^{id}, Y^{id}}$ and the radius $r = r_{eSO}$ are known, while $P^{X^{di}}$, $P^{Y^{di}}$
are arbitrary, unknown and uncontrollable; however, we assume that

$\mathrm{E}_{di} X^{di} = \mathrm{E}_{id} X^{id}$, $\mathrm{E}_{di}\,|X^{di}|^2 \le G$ (3.31)

for some known $G$ with $\mathrm{E}_{id}\,|X^{id}|^2 \le G < \infty$, and accordingly define

$\mathcal{U}^{eSO}(r) = \bigcup_{0 \le s \le r} \partial\mathcal{U}^{eSO}(s)$, $\partial\mathcal{U}^{eSO}(r) = \{\, \mathcal{L}(X^{re}, Y^{re}) \text{ acc. to (3.29)-(3.31)} \,\}$ (3.32)
Remark 3.9. At first glance, moment condition (3.31) seems to violate (distribu- 
tional) robustness; however, this condition has not been introduced to induce a higher 
degree of robustness, but rather to extend the applicability of Theorem 3.2. 

Theorem 3.10 (minimax-eSO). The pair $(f_0, P_0^{Y^{di}})$, optimal in the Minimax-
SO-problem to radius $r_{SO} = r$ from Theorem 3.2, extended to $(f_0, P_0^{Y^{di}} \otimes P^{X^{di}})$
for any $P^{X^{di}}$ such that $\mathrm{E}_{di}\,|X^{di}|^2 = G$, remains a saddle-point in the corresponding
Minimax-Problem on the eSO-neighborhood $\mathcal{U}^{eSO}$ to the same radius $r$, no
matter what bound $G$ in equation (3.31) holds. The value of the minimax risk is

$\operatorname{tr} \mathrm{Cov}_{id} X^{id} + r\,(G - \mathrm{E}_{id}\,|X^{id}|^2) - (1 - r)\,\mathrm{E}_{id}\big[\,|D(Y^{id})|^2\,w_r(D(Y^{id}))\,\big]$ (3.33)
As an application of Theorem 3.10, we now invoke a coupling idea: In the
Gaussian setup, i.e., assuming (2.3), we no longer regard the (SO-) saddle-point
solution to a $\mathcal{U}^{SO}(r)$-neighborhood around $\mathcal{L}(\Delta X)$ stemming from an rLS-past,
but use Theorem 3.10 as follows:

Proposition 3.11. Assume that for each time $t$ there is a (fictive) random
variable $\Delta X_t^{\mathcal{N}} \sim \mathcal{N}_p(0, \Sigma)$ such that $\Delta X_t^{rLS}$ stemming from an rLS-past can be
considered an $X^{re}$ in the corresponding eSO-neighborhood around $\Delta X_t^{\mathcal{N}}$ with
radius $r$. Then, rLS is exactly minimax for each time $t$.

Remark 3.12. (a) Existence of AX^ ~ A/'p(0,E) in a general setting is not yet 
proved. To this end one has to show moment condition (3.31) and that 



where p ' , p ' are the corresponding Lebesgue densities and sup_;^ is the corre- 
sponding essential supremum w.r.t. Lebesgue measure in the respective dimension. 
Clearly condition (3.34) is the difficulty, while condition (3.31) is not hard to fulfill — 
we only need to check that Eid AXt = 0, which for the rLS follows from symmetry of 
the distributions in the ideal model, and that the second moment is bounded — which 
also clearly holds. 

(b) As to the choice of covariance $\Sigma$ for $\Delta X^{\mathcal N}$, we have two candidates:
$\Sigma = \mathrm{Cov}\,\Delta X^{\rm rLS}$ and $\Sigma = \Sigma_{t|t-1}$ from the classical Kalman filter. While the
former takes up the actual error covariances, the latter is much easier to compute. In
our numerical examples in Ruckdeschel (2001), we could not find any significant
advantages for the former in terms of precision and hence propose the latter for
computational reasons.

(c) For $p = 1$, (3.34) could be checked numerically in a number of models, cf.
Ruckdeschel (2001, Table 8.1). For $p > 1$, particle filter techniques should be helpful.

4. IO-optimality

In this section, we translate the preceding optimality results to the IO situation.
We have already noted that in this case, instead of attenuating (the influence
of) a dubious observation we would rather want to follow an IO outlier as fast
as possible. It is well-known that the Kalman filter tends to be too inert for
this task and faster tracking filters are needed. To do so, let us go back to our
"Bayesian" model (3.9) but now we specify the transition densities $\pi(y,x)$ to
come from an observation $Y$ which is built up additively as



$$ Y = X + \varepsilon \tag{4.1} $$

Equation (4.1) reveals a remarkable symmetry of $X$ and $\varepsilon$ which we are going
to exploit now: Apparently

$$ E[X|Y] = Y - E[\varepsilon|Y] \tag{4.2} $$

This is helpful if we are now assuming that $\varepsilon$ will be ideally distributed, and
instead the states $X$ get corrupted. To this end, we retain the SO-model from the
preceding sections, i.e., $Y^{\rm id}$ will be replaced from time to time by $Y^{\rm di}$. Contrary
to the AO formulation however, we now assume that this replacement by $Y^{\rm di}$
reflects a corresponding change in $X$, as we now want to track the distorted
signal. As a consequence this gives the following IO-version of the minimax
problem (where the only visible difference is the superscript "re" for $X$).

$$ \max\nolimits_{\mathcal{U}} E_{\rm re}\,|X^{\rm re} - f(Y^{\rm re})|^2 = \min\nolimits_f\,! \tag{4.3} $$

But, using $X^{\rm re} = Y^{\rm re} - \varepsilon$, and setting $\tilde f(y) = y - f(y)$ we obtain the equivalent
formulation

$$ \max\nolimits_{\mathcal{U}} E_{\rm re}\,|\varepsilon - \tilde f(Y^{\rm re})|^2 = \min\nolimits_{\tilde f}\,! \tag{4.4} $$

and we are back in the situation of subsection (3.2) with the respective roles of
$X$ and $\varepsilon$ interchanged. That is, the corresponding theorems translate word by
word. Skipping the Lemma 5 solution we obtain

Theorem 4.1 (Minimax-IO). (1)' In this situation, there is a saddle-point
$(f_0, P_0^{Y^{\rm di}})$ for Problem (4.3)

$$ f_0(y) := y - D(y)\,\min\{1, \rho/|D(y)|\} \tag{4.5} $$
$$ P_0^{Y^{\rm di}}(dy) := \tfrac{1-r}{r}\,\big(|D(y)|/\rho - 1\big)_+\, P^{Y^{\rm id}}(dy) \tag{4.6} $$

where $\rho > 0$ ensures that $\int P_0^{Y^{\rm di}}(dy) = 1$ and

$$ D(y) = y - E_{\rm id}[X|Y = y] \tag{4.7} $$

(3)' If $E_{\rm id}[X|Y]$ is linear in $Y$, i.e., $E_{\rm id}[X|Y] = MY$ for some matrix $M$, then
necessarily

$$ M = M^0 = \mathrm{Cov}(X, Y)\,\mathrm{Var}\,Y^{-} \tag{4.8} $$

or in the SSM formulation: $M^0$ is just the classical Kalman gain and
$f_0$ the (one-step) rLS.IO defined below.

Note that contrary to Theorem 3.2 where $EX$ need not be 0, here $E\varepsilon = 0$,
which simplifies the definition of $D$ in (4.7). Details on how to use this for a
corresponding IO-robust variant of rLS are given in Ruckdeschel (2010).
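
To illustrate how (4.5) acts in the linear case (3)', the following minimal
sketch evaluates $f_0(y) = y - D(y)\min\{1, \rho/|D(y)|\}$ with $D(y) = y - My$ for the
additive model (4.1). The function name `rls_io_step`, the gain `M` and the
clipping height `rho` are hypothetical illustration values, not taken from
Ruckdeschel (2010); in practice $\rho$ would be determined by the radius $r$.

```python
import math

def rls_io_step(y, M, rho):
    """One-step IO-robust correction f0(y) = y - D(y) * min(1, rho/|D(y)|).

    Assumes the linear case E_id[X | Y = y] = M y, so that D(y) = y - M y.
    y: list of floats, M: square matrix as list of rows, rho: clipping height > 0.
    """
    My = [sum(m_ij * y_j for m_ij, y_j in zip(row, y)) for row in M]
    D = [y_i - my_i for y_i, my_i in zip(y, My)]
    norm_D = math.sqrt(sum(d * d for d in D))
    w = 1.0 if norm_D <= rho else rho / norm_D   # weight min(1, rho/|D|)
    return [y_i - w * d for y_i, d in zip(y, D)]

M = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical gain, for illustration only
rho = 1.0                      # hypothetical clipping height

small = rls_io_step([0.4, -0.2], M, rho)    # |D| <= rho: classical step M y
large = rls_io_step([30.0, 40.0], M, rho)   # |D| > rho: stays within rho of y
```

For small $|D(y)|$ the step coincides with the classical linear correction $My$,
whereas for large $|D(y)|$ the result stays within distance $\rho$ of the observation,
which is the tracking behavior wanted for IO's.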

5. Conclusion and Outlook 

In the extremely flexible class of dynamic models consisting in SSMs we were 
able to obtain optimality results for filtering. In this generality this is a novelty. 
We stress the fact that our filters are non-iterative, recursive, hence fast, and 
valid for higher dimensions. 

So far, we have not said much about the implementation of these filters.
rLS.AO was originally implemented in XploRe, compare Ruckdeschel (2000).
In an ongoing project with Bernhard Spangl, BOKU, Vienna, and Irina Ursachi
(ITWM), we are about to implement the rLS filter in R (R Development
Core Team (2010)), more specifically in an R-package robKalman, the development
of which is done under the r-forge project
https://r-forge.r-project.org/projects/robkalman/ (R-Forge Administration and
Development Team (2008)). Under this address you will also find a preliminary
version available for download.

In an extra paper, which for the moment is available as technical report,
Ruckdeschel (2010), we also check the properties of our filters in simulations and
discuss the extension of these optimally-robust filters to a filter that combines
the two types (for system-endogenous and -exogenous outlier situation). This
hybrid filter is capable of treating (wide-sense) IO's and AO's simultaneously,
albeit with minor delay.

6. Proofs 

Proof to Lemma 3.1 We use the fact that for $0 < a,b,c,d$, $(a + b)/(c + d) \le
\max(a/c, b/d)$. Hence

$$ \rho_0(s) \le \max\{A_s/A_{r_l},\, B_s/B_{r_u}\} \tag{6.1} $$

Equation (3.3) shows that $b(r)$ is (strictly) decreasing in $r$ (for $r > 0$) from $\infty$ to
0. Hence $A_r$ is increasing in $r$, and $B_r$ decreasing, $B_r$ from $\infty$ to 0. By dominated
convergence $b(r)$, and hence $A_r$ and $B_r$ are continuous in $r$. Thus existence of
$r_0$ follows. For $r_u = 1$, one argues letting $r_u \in [0, 1)$ tend to 1. To show equality
in (6.1), we parallel Kohl (2005, Lemma 2.2.3), and first show that for $r > s$,
$s$ fixed, $\rho(r, s)$ is increasing and correspondingly, for $r < s$, $s$ fixed, decreasing,
which entails (3.8): Let $0 < s < r_1 < r_2 < 1$. Then by monotony of $A_r$, $B_r$,
$(A_s B_s^{-1} + r_1)^{-1} \ge (A_{r_1} B_{r_1}^{-1} + r_1)^{-1}$; multiplying this inequality with $(r_2 - r_1)$,
we get $(r_2 - r_1) B_s (A_s + r_1 B_s)^{-1} \ge (r_2 - r_1) B_{r_1} (A_{r_1} + r_1 B_{r_1})^{-1}$. Now, due to
optimality of $A_r + rB_r$ for radius $r$,

$$ 0 \le \frac{(r_2 - r_1) B_s}{A_s + r_1 B_s} - \frac{(r_2 - r_1) B_{r_1} + A_{r_2} + r_2 B_{r_2} - A_{r_1} - r_2 B_{r_1}}{A_{r_1} + r_1 B_{r_1}} = $$
$$ = (r_2 - r_1) B_s (A_s + r_1 B_s)^{-1} - (A_{r_2} + r_2 B_{r_2})(A_{r_1} + r_1 B_{r_1})^{-1} + 1 $$

Multiplying with $(A_s + r_1 B_s)/(A_{r_2} + r_2 B_{r_2})$, we obtain indeed $\rho(r_2, s) \ge \rho(r_1, s)$,
and similarly for $1 > s > r_1 > r_2 > 0$. Next, for $r_0$ least favorable, we show
that for $r$ fixed, and $s > r$, $\rho(r, s)$ is increasing and correspondingly, for $s < r$,
decreasing: Let $0 < r < r_1 < r_2 < 1$. Then, due to optimality of $A_{r_1} + r_1 B_{r_1}$,

$$ A_{r_2} + r B_{r_2} - A_{r_1} - r B_{r_1} = $$
$$ = (r_1 - r)(B_{r_1} - B_{r_2}) + A_{r_2} + r_1 B_{r_2} - A_{r_1} - r_1 B_{r_1} \ge 0 $$

and similarly for $1 > r > r_1 > r_2 > 0$. For the last assertion, note that by (3.3),
$b(1) = 0$, hence $B_1 = 0$. Hence $\max\{A_s/A_{r_l}, B_s/B_1\} = \infty$ for $s < 1$, while for
$s = 1$, we get $\rho_0(1) = \max\{A_1/A_{r_l}, 1\} = 1$. □

Proof to Theorem 3.2 (1) Let us solve $\max_{\partial\mathcal{U}} \min_f [\ldots]$ first, which amounts
to $\min_{\partial\mathcal{U}} E_{\rm re}\big[\,|E_{\rm re}[X|Y^{\rm re}]|^2\,\big]$. For fixed element $P^{Y^{\rm di}}$ assume w.l.o.g. that
$P^{Y^{\rm di}} \ll \mu$ for $\mu$ from (3.9); otherwise we replace $\mu$ by $\mu + P^{Y^{\rm di}}$. This gives us a
$\mu$-density $q(y)$ of $P^{Y^{\rm di}}$. Determining the joint (real) law $P^{X,Y^{\rm re}}(dx, dy)$ as

$$ P(X \in A,\, Y^{\rm re} \in B) = \int I_A(x)\, I_B(y)\, [(1-r)\,\pi(y,x) + r\,q(y)]\, P^X(dx)\, \mu(dy) \tag{6.2} $$

we deduce that $\mu(dy)$-a.e.

$$ E_{\rm re}[X|Y^{\rm re} = y] = \frac{r\,q(y)\,EX + (1-r)\,p^{Y^{\rm id}}(y)\,E_{\rm id}[X|Y = y]}{r\,q(y) + (1-r)\,p^{Y^{\rm id}}(y)} =: \frac{a_1 q(y) + a_2(y)}{a_3 q(y) + a_4(y)} \tag{6.3} $$

Hence we have to minimize

$$ F(q) := \int \frac{|a_1 q(y) + a_2(y)|^2}{a_3 q(y) + a_4(y)}\, \mu(dy) $$

in $M_0 = \{q \in L_1(\mu) \mid q \ge 0,\ \int q\,d\mu = 1\}$. To this end, we note that $F$ is convex
on the non-void, convex cone $M = \{q \in L_1(\mu) \mid q \ge 0\}$ so, for some $\tilde\rho > 0$, we
may consider the Lagrangian

$$ L_{\tilde\rho}(q) := F(q) + \tilde\rho \int q\, d\mu \tag{6.4} $$

for some positive Lagrange multiplier $\tilde\rho$. Pointwise minimization in $y$ of $L_{\tilde\rho}(q)$
gives

$$ q_s(y) = \tfrac{1-r}{r}\,\big(|D(y)|/s - 1\big)_+\, p^{Y^{\rm id}}(y) $$

for some constant $s = s(\tilde\rho) = (|EX|^2 + \tilde\rho/r)^{1/2}$. Pointwise in $y$, $q_s$ is antitone
and continuous in $s > 0$ and $\lim_{s \to 0[\infty]} q_s(y) = \infty[0]$, hence by monotone
convergence,

$$ H(s) = \int q_s(y)\, \mu(dy) $$

too, is antitone and continuous and $\lim_{s \to 0[\infty]} H(s) = \infty[0]$. So by continuity,
there is some $\rho \in (0, \infty)$ with $H(\rho) = 1$. On $M_0$, $\int q\,d\mu = 1$, but $q_\rho = q_{s=\rho} \in M_0$
and is optimal on $M \supset M_0$ hence it also minimizes $F$ on $M_0$. In particular, we
get representation (3.17) and note that, independently from the choice of $\mu$,
the least favorable $P_0^{Y^{\rm di}}$ is dominated according to $P_0^{Y^{\rm di}} \ll P^{Y^{\rm id}}$, i.e., non-dominated
$P^{Y^{\rm di}}$ are even easier to deal with.
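
The existence argument for $\rho$ is constructive: since $H$ is antitone and
continuous, $H(\rho) = 1$ can be solved by bisection. The sketch below does this
under the simplifying assumption (ours, for illustration only) that $D(Y^{\rm id})$ is
one-dimensional and $\mathcal{N}(0, \sigma^2)$, so that
$E(|D| - s)_+ = 2[\sigma\varphi(s/\sigma) - s(1 - \Phi(s/\sigma))]$ is available in closed form; the
function names are hypothetical.

```python
import math

def expected_excess(s, sigma):
    """E (|D| - s)_+ for D ~ N(0, sigma^2), in closed form."""
    a = s / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(a / math.sqrt(2.0))   # 1 - Phi(a)
    return 2.0 * (sigma * pdf - s * upper_tail)

def H(s, r, sigma):
    """Total mass of q_s: (1 - r)/r * E (|D|/s - 1)_+."""
    return (1.0 - r) / r * expected_excess(s, sigma) / s

def solve_rho(r, sigma=1.0, tol=1e-12):
    """Bisection for H(rho) = 1, using that H is antitone and continuous in s."""
    lo, hi = 1e-12, 1.0
    while H(hi, r, sigma) > 1.0:      # enlarge hi until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if H(mid, r, sigma) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho_small_r = solve_rho(0.05)   # small radius: little clipping needed
rho_large_r = solve_rho(0.30)   # larger radius: stronger clipping
```

In this Gaussian illustration the computed $\rho$ decreases as the radius $r$
grows, in line with the antitonicity of $\rho(r)$ used further below.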
As next step we show that

$$ \max\nolimits_{\partial\mathcal{U}} \min\nolimits_f [\ldots] = \min\nolimits_f \max\nolimits_{\partial\mathcal{U}} [\ldots] \tag{6.5} $$

To this end we first verify (3.16) determining $f_0(y)$ as $f_0(y) = E_{{\rm re};P_0}[X|Y^{\rm re} = y]$.
Writing a sub/superscript "re; P" for evaluation under the situation generated
by $P = P^{Y^{\rm di}}$ and $P_0$ for $P_0^{Y^{\rm di}}$, we obtain the risk for general $P$ as

$$ \mathrm{MSE}_{{\rm re};P}[f_0(Y^{{\rm re};P})] = (1-r)\,E_{\rm id}|X - f_0(Y^{\rm id})|^2 + r\,\mathrm{tr}\,\mathrm{Cov}\,X + r\,E_P \min(|D(Y^{\rm di})|^2, \rho^2) \tag{6.6} $$

This is maximal for any $P$ that is concentrated on the set $\{|D(Y^{\rm di})| > \rho\}$,
which is true for $P_0$. Hence (6.5) follows, as for any contaminating $P$

$$ \mathrm{MSE}_{{\rm re};P}[f_0(Y^{{\rm re};P})] \le \mathrm{MSE}_{{\rm re};P_0}[f_0(Y^{{\rm re};P_0})] $$

Finally, we pass over from $\partial\mathcal{U}$ to $\mathcal{U}$: Let $f_r$, $P_r$ denote the components of the
saddle-point for $\partial\mathcal{U}(r)$, as well as $\rho(r)$ the corresponding Lagrange multiplier
and $w_r$ the corresponding weight, i.e., $w_r = w_r(y) = \min(1, \rho(r)/|D(y)|)$. Let
$R(f, P, r)$ be the MSE of procedure $f$ at the SO model $\partial\mathcal{U}(r)$ with contaminating
$P^{Y^{\rm di}} = P$. As can be seen from (3.17), $\rho(r)$ is antitone in $r$; in particular, as $P_r$ is
concentrated on $\{|D(Y)| > \rho(r)\}$ which for $r \le s$ is a subset of $\{|D(Y)| > \rho(s)\}$,
we obtain

$$ R(f_s, P_s, s) = R(f_s, P_r, s) \quad \text{for } r \le s $$

Note that $R(f_s, P, 0) = R(f_s, Q, 0)$ for all $P$, $Q$, hence passage to $\tilde R(f_s, P, r) =
R(f_s, P, r) - R(f_s, P, 0)$ is helpful, and that

$$ \mathrm{tr}\,\mathrm{Cov}\,X = E_{\rm id}\big[\,\mathrm{tr}\,\mathrm{Cov}_{\rm id}[X|Y^{\rm id}] + |D(Y^{\rm id})|^2\,\big] \tag{6.7} $$

Abbreviate $\bar w_s(Y^{\rm id}) = 1 - (1 - w_s(Y^{\rm id}))^2 \ge 0$ to see that

$$ \tilde R(f_s, P, r) = r\,\big\{ E_{\rm id}[\,|D(Y^{\rm id})|^2\, \bar w_s(Y^{\rm id})\,] + E_P \min(|D(Y^{\rm di})|, \rho(s))^2 \big\} \le $$
$$ \le r\,\big\{ E_{\rm id}[\,|D(Y^{\rm id})|^2\, \bar w_s(Y^{\rm id})\,] + \rho(s)^2 \big\} = \tilde R(f_s, P_r, r) \le \tilde R(f_s, P_s, s) $$

Hence the saddle-point extends to $\mathcal{U}(r)$; in particular the maximal risk is never
attained in the interior $\mathcal{U}(r) \setminus \partial\mathcal{U}(r)$. (3.19) follows by plugging in the results.
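(2) Let $\tilde f(Y) = f(Y) - EX$, and $X^0 = X - EX$; then (3.15) becomes

$$ E_{\rm id}\,|X^0 - \tilde f(Y)|^2 = \min\nolimits_f\,! \quad \text{s.t.} \quad \sup\nolimits_{\partial\mathcal{U}} |E_{\rm re}\,\tilde f(Y^{\rm re})| \le b \tag{6.8} $$

The assertion follows upon noting that $\sup_{\partial\mathcal{U}} |E_{\rm di}\,\tilde f| = \sup |\tilde f|$ (to be shown just
as in Rieder (1994, chap. 5)) and writing

$$ E_{\rm id}\,|X^0 - \tilde f(Y)|^2 = E_{\rm id}\, E\big[\,|X^0 - \tilde f(Y)|^2 \,\big|\, Y\,\big] $$

and then minimizing the inner expectation subject to $|\tilde f(Y)| \le b$ pointwise in $Y$.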

(3) If $E_{\rm id}[X|Y]$ is linear in $Y$, the corresponding optimal matrix $M^0$ is just the
respective Fourier coefficient, i.e., $\mathrm{Cov}(X, Y)\,\mathrm{Var}\,Y^{-}$. We have already recalled
that the classical Kalman filter is optimal among all linear filters; hence the
corresponding Kalman gain is then the optimal linear transformation in the
SSM context. □

Remark 6.1. (a) Birmiwal and Shen (1993) proceed similarly for their result. How- 
ever, they invoke a minimax result by Ferguson (1967) which in our infinite dimensional 
setting is not applicable. Also their setting is restricted to one dimension, and they 
assume Lebesgue densities right away — also in the contaminated situation. In par- 
ticular, they do not realize the connection to the exact conditional mean present in 
equation (3.18).

(b) For an alternative proof, see Ruckdeschel (2001, pp. 156-163): It uses Rieder (1994,
App. B), showing existence of Lagrange multipliers in (1) by abstract compactness and 
continuity arguments. 

(c) The fact that the solutions to Problems (3.14) and (3.15) coincide parallels the 
situation in the estimation problem for a one-dimensional location parameter. 

Proof to Proposition 3.4 Recall that by the Cramér-Lévy Theorem (cf.
Feller (1971, Thm. 1, p. 525)) the sum of two independent random variables
has Gaussian distribution iff each summand is Gaussian. This can easily be
translated into a corresponding asymptotic statement, cf. Ruckdeschel (2001,
Prop. A.2.4), i.e., the sum of two independent random variables converges
weakly to a Gaussian distribution iff each summand converges weakly to a
Gaussian distribution. We first consider (for fixed $t$, omitted from notation where
clear) the filter error,

$$ \Delta\tilde X := X_t - X_{t|t} = \Delta X - H_b(M^0 \Delta Y) $$

where we assume $\Delta X$, $\varepsilon$, and $v$ normal. Then the conditional law of $\Delta\tilde X$ given
$\Delta Y$ is $\mathcal{N}_p(g, (I_p - M^0 Z)\Sigma)$ for $\Sigma = \mathrm{Cov}\,\Delta X$ and $g := M^0\Delta Y - H_b(M^0 \Delta Y)$,
whose norm is $|g| = (|M^0 \Delta Y| - b)_+$. Hence

$$ \mathcal{L}(\Delta\tilde X) = \mathcal{L}(g) * \mathcal{N}_p(0, (I_p - M^0 Z)\Sigma) $$

which by Cramér-Lévy cannot be normal, as $g$ is obviously not normal. Consequently
$\Delta X_{t+1} = F_{t+1}\Delta\tilde X_t + v_{t+1}$ cannot be normal either. Hence starting with
normal $\Delta X_t$ and $\varepsilon_t$, $\Delta X_{t+1}$ cannot be normal. The same assertion clearly holds
if $v_t$ is not normal. As by (3.21), $g_t$ does neither converge to 0 nor to $M^0 \Delta Y$,
the asymptotic version of Cramér-Lévy also excludes asymptotic normality. □
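
To make the non-normality of $g$ concrete, the following sketch computes
$g = x - H_b(x)$ for a correction $x = M^0 \Delta Y$, assuming the usual Huberization
$H_b(x) = x \min\{1, b/|x|\}$ (we take this form as given here). The helper names
are hypothetical. Whenever $|x| \le b$ the clipped part $g$ is exactly 0, so
$\mathcal{L}(g)$ has an atom at 0.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))

def huberize(x, b):
    """H_b(x) = x * min(1, b/|x|): clip the Euclidean norm of x at b."""
    n = norm(x)
    if n <= b:
        return list(x)
    return [v * b / n for v in x]

def clipped_part(x, b):
    """g = x - H_b(x), the part of the correction removed by clipping."""
    return [v - h for v, h in zip(x, huberize(x, b))]

b = 2.0
inside = clipped_part([1.0, 1.0], b)    # |x| < b: nothing is clipped, g = 0
outside = clipped_part([3.0, 4.0], b)   # |x| = 5 > b: |g| = |x| - b
```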



Remark 6.2. A similar assertion for the case that $v_t$ is normal but not both $\Delta X_t$
and $\varepsilon_t$ are, seems plausible and we conjecture that this is true; it may also be proven
in particular cases, but in general, it is hard to obtain due to the lack of independence
of $\Delta X - g$ and $\Delta Y$.

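Proof to Proposition 3.6 For the second equivalence in Proposition 3.6 we
use the following lemma and a corollary of it:

Lemma 6.3. Let $\varepsilon \sim \mathcal{N}_q(0,V)$, $X \sim P^X$ and for some measurable function
$h\colon \mathrm{range}(X) \to \mathbb{R}^q$ let $Y = h(X) + \varepsilon$. Let $g \in L_1(P^X)$, i.e., $g$ is a measurable
(vector-valued) function on $\mathrm{range}(X)$ with $E_{P^X}|g(X)| < \infty$. Then

$$ \frac{\partial}{\partial y}\, E[g(X)|Y = y] = \mathrm{Cov}[g(X), h(X) \,|\, Y = y]\; V^{-1} \tag{6.9} $$

Proof. For simplicity, we only consider $\mathrm{rk}\,V = q$; otherwise we may pass to
$\varepsilon = A\tilde\varepsilon$ for some $\tilde\varepsilon \sim \mathcal{N}_{\tilde q}(0, \tilde V)$ with $\mathrm{rk}\,\tilde V = \tilde q$ and use the generalized inverse $V^{-}$
instead of $V^{-1}$ everywhere in the proof.

Let $p^\varepsilon$ be the Lebesgue density of $\varepsilon$ and denote $\Lambda^\varepsilon(e) := \frac{\partial}{\partial e} \log p^\varepsilon(e)$. Then,
no matter whether $\varepsilon$ is Gaussian, it holds that

$$ E[g(X)|Y = y] = \frac{\int g(x)\, p^\varepsilon(y - h(x))\, P^X(dx)}{\int p^\varepsilon(y - h(x))\, P^X(dx)} $$

As $\varepsilon$ is normal, we may interchange differentiation and integration and obtain
that

$$ \frac{\partial}{\partial y}\, E[g(X)|Y = y] = \mathrm{Cov}[g(X), \Lambda^\varepsilon(y - h(X)) \,|\, Y = y] $$

But as $\varepsilon \sim \mathcal{N}_q(0, V)$, it holds that $\Lambda^\varepsilon(e) = -V^{-1}e$, which entails (6.9) as

$$ \Lambda^\varepsilon(y - h(x)) - E[\Lambda^\varepsilon(y - h(X)) \,|\, Y = y] = V^{-1}\big(h(x) - E[h(X)|Y = y]\big) $$

□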
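
Identity (6.9) is easy to check numerically. The sketch below does so in the
scalar case ($q = 1$, $V = v$) for a discrete $X$, comparing a central difference
quotient of $y \mapsto E[g(X)|Y = y]$ with $\mathrm{Cov}[g(X), h(X)|Y = y]/v$. The support points,
probabilities and the functions `g`, `h` are arbitrary illustration choices, not
taken from the paper.

```python
import math

def posterior_weights(y, xs, probs, h, v):
    """P(X = x_i | Y = y) for Y = h(X) + eps with eps ~ N(0, v)."""
    raw = [p * math.exp(-0.5 * (y - h(x)) ** 2 / v) for x, p in zip(xs, probs)]
    total = sum(raw)
    return [w / total for w in raw]

def cond_mean(f, y, xs, probs, h, v):
    """E[f(X) | Y = y]."""
    w = posterior_weights(y, xs, probs, h, v)
    return sum(wi * f(x) for wi, x in zip(w, xs))

def cond_cov(f1, f2, y, xs, probs, h, v):
    """Cov[f1(X), f2(X) | Y = y]."""
    m1 = cond_mean(f1, y, xs, probs, h, v)
    m2 = cond_mean(f2, y, xs, probs, h, v)
    return cond_mean(lambda x: (f1(x) - m1) * (f2(x) - m2), y, xs, probs, h, v)

xs = [-1.0, 0.0, 0.5, 2.0]          # support of X (illustration only)
probs = [0.1, 0.4, 0.3, 0.2]        # P(X = x_i)
h = lambda x: x ** 3 - x            # nonlinear observation function
g = lambda x: math.sin(x) + x * x   # integrable test function
v, y, step = 1.5, 0.7, 1e-5

# left-hand side of (6.9) by central differences, right-hand side in closed form
lhs = (cond_mean(g, y + step, xs, probs, h, v)
       - cond_mean(g, y - step, xs, probs, h, v)) / (2.0 * step)
rhs = cond_cov(g, h, y, xs, probs, h, v) / v
```

Both sides agree up to the discretization error of the difference quotient.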

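Corollary 6.4. In our linear time discrete, Euclidean SSM, omitting indices
$t$, assume that $\mathrm{rk}\,V = q$ and let

$$ U := V^{-1} Z \Delta X, \qquad U^0 := U - E[U|\Delta Y], \qquad \Delta X^0 := \Delta X - E[\Delta X|\Delta Y] \tag{6.10} $$

Then

$$ \frac{\partial}{\partial y}\, E[\Delta X|\Delta Y = y] = \mathrm{Cov}(\Delta X, U \,|\, \Delta Y = y) \tag{6.11} $$

$$ \frac{\partial^2}{\partial y_j\, \partial y_k}\, E[\Delta X_i|\Delta Y = y] = E(\Delta X_i^0\, U_j^0\, U_k^0 \,|\, \Delta Y = y) \tag{6.12} $$

Proof. During the proof we will omit $\Delta$ in notation. Equation (6.11) is just
plugging in Lemma 6.3. We note that equivalently to (6.9) we could have written

$$ \frac{\partial}{\partial y}\, E[X|Y = y] = E[X (U^0)^\tau \,|\, Y = y] = E[X U^\tau \,|\, Y = y] - E[X|Y = y]\; E[U|Y = y]^\tau $$

Hence applying Lemma 6.3 for $g(X) = X_i U_j$ and $g(X) = U_j$ to the last two
terms we obtain

$$ \frac{\partial^2}{\partial y_j\, \partial y_k}\, E[X_i|Y = y] = E[X_i U_j U_k^0 \,|\, Y = y] - E[X_i|Y = y]\; E[U_j U_k^0 \,|\, Y = y] - E[X_i U_k^0 \,|\, Y = y]\; E[U_j|Y = y] = $$
$$ = E[X_i^0\, U_j^0\, U_k^0 \,|\, Y = y] $$

□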

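Proof to Proposition 3.6 Equivalence (3.23):

If $\mathcal{L}(\Delta X)$ is normal, the uncorrelated random variables $\Pi\Delta X$ and $\bar\Pi\Delta X$ are
independent and again normal, while the random variables $\Delta X$, $\Delta Y$ are jointly
normal, hence linearity of conditional expectation is a well-known fact.

If $E_{\rm id}[\Delta X|\Delta Y]$ is linear, after subtracting $E_{\rm id}[MZ\Delta X|\Delta Y]$ from both sides, the
defining equation for the conditional expectation $\lambda(dy)$-a.e. reads

$$ M \int (y - Zx)\, p^\varepsilon(y - Zx)\, P^{\Delta X}(dx) = (I_p - MZ) \int x\, p^\varepsilon(y - Zx)\, P^{\Delta X}(dx) \tag{6.13} $$

Let us introduce $q^\varepsilon(y) = y\,p^\varepsilon(y)$ and the signed measure $Q^{\Delta X}(dx) = x\,P^{\Delta X}(dx)$; if
we denote the mapping $h\colon \mathbb{R}^q \to \mathbb{R}$, $y \mapsto h(y) = \int f(y - Zx)\, G(dx)$ by $f *_Z G$,
(6.13) becomes

$$ M\, q^\varepsilon *_Z P^{\Delta X} = (I_p - MZ)\, p^\varepsilon *_Z Q^{\Delta X} \tag{6.14} $$

We pass over to the Fourier transforms (denoted with a hat) for $s \in \mathbb{R}^p$, $t \in \mathbb{R}^q$,

$$ \hat q^{X}(s) = \int e^{i s^\tau x}\, Q^{\Delta X}(dx), \qquad \hat p^{X}(s) = \int e^{i s^\tau x}\, P^{\Delta X}(dx), $$
$$ \hat q^{\varepsilon}(t) = \int e^{i t^\tau y}\, q^\varepsilon(y)\, dy, \qquad \hat p^{\varepsilon}(t) = \int e^{i t^\tau y}\, p^\varepsilon(y)\, dy. $$

As usual, convolution translates into products in Fourier space, in our case

$$ \widehat{f *_Z G}(t) = \hat f(t)\, \hat G(Z^\tau t), \qquad t \in \mathbb{R}^q $$

and hence (6.14) in Fourier space is $M\, \hat q^\varepsilon\, \hat p^X(Z^\tau \cdot) = (I_p - MZ)\, \hat p^\varepsilon\, \hat q^X(Z^\tau \cdot)$. For
the derivatives $(\hat p^X)'(s)$, $(\hat p^\varepsilon)'(t)$ for $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$, we obtain

$$ (\hat p^X)'(s) = i\, \hat q^X(s), \qquad (\hat p^\varepsilon)'(t) = i\, \hat q^\varepsilon(t) \tag{6.15} $$

By assumption, $I_p - MZ$ is invertible and $\varepsilon \sim \mathcal{N}_q(0,V)$, hence $\hat p^\varepsilon(t) =
\exp(-t^\tau V t/2) > 0$ and together with (6.15), this gives the linear differential
equation

$$ (\hat p^X)'(Z^\tau t) = -(I_p - MZ)^{-1} M V t\; \hat p^X(Z^\tau t) \tag{6.16} $$

Fixing any direction $t_0$ such that $Z^\tau t_0 \ne 0$, and writing $g(s) := \hat p^X(s Z^\tau t_0)$ for
real $s$, this becomes an ODE

$$ g'(s) = -t_0^\tau Z (I_p - MZ)^{-1} M V t_0\; s\, g(s), \qquad g(0) = 1 $$

which has a unique solution given by

$$ g(s) = \exp\big(-t_0^\tau Z (I_p - MZ)^{-1} M V t_0\; s^2/2\big) $$

This is the characteristic function of a normal distribution, so $Z\Delta X$, hence also
$\Pi\Delta X$ are normal, and together with (3.25) the assertion follows. On the other
hand, $\mathrm{Cov}\,Z\Delta X = Z\Sigma Z^\tau$, so we have also shown that $Z(I_p - MZ)^{-1}MV =
Z\Sigma Z^\tau$, which otherwise is tricky unless assuming $\Sigma$ and $\Delta$ invertible.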
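
The matrix identity $Z(I_p - MZ)^{-1}MV = Z\Sigma Z^\tau$ can be checked numerically. The
sketch below does so for $p = 2$, $q = 1$ with $M$ taken to be the classical Kalman
gain $\Sigma Z^\tau (Z\Sigma Z^\tau + V)^{-1}$; the function name and the numerical values of
$\Sigma$, $Z$, $V$ are hypothetical illustration choices.

```python
def kalman_identity_gap(Sigma, Z, V):
    """Return Z (I - M Z)^{-1} M V - Z Sigma Z^T for p = 2, q = 1.

    Sigma: 2x2 covariance (list of rows), Z: length-2 row vector, V: scalar > 0.
    M is the classical Kalman gain Sigma Z^T (Z Sigma Z^T + V)^{-1}.
    """
    SZt = [Sigma[0][0] * Z[0] + Sigma[0][1] * Z[1],
           Sigma[1][0] * Z[0] + Sigma[1][1] * Z[1]]      # Sigma Z^T (2x1)
    ZSZt = Z[0] * SZt[0] + Z[1] * SZt[1]                 # Z Sigma Z^T (scalar)
    M = [SZt[0] / (ZSZt + V), SZt[1] / (ZSZt + V)]       # Kalman gain (2x1)
    # A = I - M Z (2x2), inverted via the adjugate formula
    A = [[1.0 - M[0] * Z[0], -M[0] * Z[1]],
         [-M[1] * Z[0], 1.0 - M[1] * Z[1]]]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    AinvMV = [(A_inv[0][0] * M[0] + A_inv[0][1] * M[1]) * V,
              (A_inv[1][0] * M[0] + A_inv[1][1] * M[1]) * V]
    lhs = Z[0] * AinvMV[0] + Z[1] * AinvMV[1]
    return lhs - ZSZt

gap = kalman_identity_gap([[2.0, 0.3], [0.3, 1.0]], [1.0, -0.5], 0.7)
```

Here $\det(I_p - MZ) = V/(Z\Sigma Z^\tau + V) > 0$, so the inverse exists whenever $V > 0$,
and the returned gap vanishes up to rounding.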

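Equivalence (3.24):

If $E_{\rm id}[\Delta X|\Delta Y]$ is linear, by equivalence (3.23) $\Delta X$ and $\Delta Y$ are jointly normal
with expectation 0, so the conditional law of $\Delta X^0$ given $\Delta Y$ is again normal
with expectation 0, hence in particular symmetric so the assertion follows.
Now assume

$$ E\big[\,\big(e^\tau(\Delta X - E[\Delta X|\Delta Y])\big)^3 \,\big|\, \Delta Y\,\big] = 0 \qquad \forall\, e \in \mathbb{R}^p \tag{6.17} $$

Apparently, $E_{\rm id}[\Delta X|\Delta Y]$ is linear iff $\partial^2/\partial y\,\partial y^\tau\; E_{\rm id}[\Delta X|\Delta Y = y] = 0$. But
Corollary 6.4 gives (in the notation of (6.10))

$$ \frac{\partial^2}{\partial y_j\, \partial y_k}\, E[\Delta X_i|\Delta Y = y] = E(\Delta X_i^0\, U_j^0\, U_k^0 \,|\, \Delta Y = y) \tag{6.18} $$

By complete polarization (compare Weyl (1997, Chap. I.1)), (6.17) also entails
that the symmetric multilinear form given by $\big(E[\Delta X_i^0 \Delta X_j^0 \Delta X_k^0 \,|\, \Delta Y = y]\big)_{i,j,k \in \{1,\ldots,p\}}$
is identically 0. So the assertion follows, as with $\tilde Z = V^{-1}Z$, the RHS of (6.18)
is just

$$ \sum\nolimits_{l,m} \tilde Z_{jl}\, \tilde Z_{km}\; E(\Delta X_i^0\, \Delta X_l^0\, \Delta X_m^0 \,|\, \Delta Y = y) $$ □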

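Proof to Theorem 3.10 We proceed as in Theorem 3.2, but note that in the
eSO context (6.2) becomes

$$ P(X^{\rm re} \in A,\, Y^{\rm re} \in B) = (1 - r) \int I_A(x)\, I_B(y)\, \pi(y,x)\, P^{X^{\rm id}}(dx)\, \mu(dy) $$
$$ \qquad + r \int I_A(x)\, I_B(y)\, q(y)\, P^{X^{\rm di}}(dx)\, \mu(dy) $$

and hence (6.3) becomes

$$ E_{\rm re}[X^{\rm re}|Y^{\rm re} = y] = \frac{r\,q(y)\,E_{\rm di}[X^{\rm di}] + (1 - r)\,p^{Y^{\rm id}}(y)\,E_{\rm id}[X|Y = y]}{r\,q(y) + (1 - r)\,p^{Y^{\rm id}}(y)} \tag{6.19} $$

But by (3.31), the RHS of (6.19) leads to exactly the same $F(q)$ as (6.3). Thus, we may jump
to the proof of Theorem 3.2 from this point on, replacing $\mathrm{tr}\,\mathrm{Cov}\,X$ by

$$ \tilde G := \mathrm{tr}\,\mathrm{Cov}_{P^{X^{\rm di}}} X^{\rm di} = G - |E_{\rm di} X^{\rm di}|^2 $$

in equation (6.6). For passing from $\partial\mathcal{U}^{\rm eSO}$ to $\mathcal{U}^{\rm eSO}$, let $f_r$, $P_r \otimes Q_r$ be the
components of the saddle-point at $\partial\mathcal{U}^{\rm eSO}(r)$ and $R(f, P \otimes Q, r)$ be the MSE of
procedure $f$ at $\partial\mathcal{U}^{\rm eSO}(r)$ with contaminating $P^{Y^{\rm di}} \otimes P^{X^{\rm di}} = P \otimes Q$. Instead of
equation (6.7), we use

$$ \Delta G := \tilde G - \mathrm{tr}\,\mathrm{Cov}_{\rm id} X^{\rm id} = G - E_{\rm id}|X^{\rm id}|^2 \ge 0 $$

and abbreviating $R(f, P \otimes Q, r) - R(f, P \otimes Q, 0)$ by $\tilde R(f, P \otimes Q, r)$ we obtain

$$ \tilde R(f_s, P \otimes Q, r) = r\,\big\{ \mathrm{tr}\,\mathrm{Cov}_Q X^{\rm di} - \mathrm{tr}\,\mathrm{Cov}_{\rm id} X^{\rm id} + E_{\rm id}[\,|D(Y^{\rm id})|^2\, \bar w_s(Y^{\rm id})\,] + E_P[\min(|D(Y^{\rm di})|, \rho(s))^2] \big\} \le $$
$$ \le r\,\big\{ \Delta G + E_{\rm id}[\,|D(Y^{\rm id})|^2\, \bar w_s(Y^{\rm id})\,] + \rho(s)^2 \big\} = $$
$$ = \tilde R(f_s, P_r \otimes Q_r, r) \le \tilde R(f_s, P_r \otimes Q_r, s) = \tilde R(f_s, P_s \otimes Q_s, s) $$

Hence the saddle-point extends to $\mathcal{U}^{\rm eSO}(r)$. (3.33) follows by plugging in the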
results. □

Proof to Proposition 3.8 Under $H_0$, due to Proposition 3.6, $\Delta X_t \sim
\mathcal{N}_p(0,\Sigma)$. Hence $e^\tau \Delta X_t \sim \mathcal{N}(0, \sigma^2)$. Thus by the Lindeberg-Lévy CLT,

1 " 

But the sixth moment of $\mathcal{N}(0, \sigma^2)$ is just $15\sigma^6$. Hence by the assumed consistency
of $e_n$ for $e$, Slutsky's Lemma yields (3.26). Asymptotically, the testing problem
is a test for a normal mean $\mu$ to be 0 or not, which yields the corresponding
optimality for the Gauss test given in (3.27). □
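
The moment fact used above can be verified numerically: for $\mathcal{N}(0, \sigma^2)$ the third
moment vanishes and the sixth moment equals $15\sigma^6$, so third powers of such
variables are centered with variance $15\sigma^6$. The sketch below is a plain
trapezoidal quadrature; the function name, the value of `sigma` and the grid
parameters are arbitrary illustration choices.

```python
import math

def normal_moment(k, sigma, half_width=12.0, n=200_000):
    """k-th moment of N(0, sigma^2) by the trapezoidal rule on
    [-half_width * sigma, half_width * sigma]."""
    a = -half_width * sigma
    step = 2.0 * half_width * sigma / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        dens = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * x ** k * dens
    return total * step

sigma = 1.3
third = normal_moment(3, sigma)   # vanishes by symmetry
sixth = normal_moment(6, sigma)   # to be compared with 15 * sigma**6
```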

Proof to Proposition 3.11 Let us identify $X := \Delta X^{\mathcal N}$, $Y := \Delta Y^{\mathcal N} :=
Z\Delta X^{\mathcal N} + \varepsilon$, and set $P^\varepsilon = \mathcal{N}_q(0, V)$, $P^X = \mathcal{N}_p(0, \Sigma)$, and let $p^\varepsilon$ be the
corresponding Lebesgue density, then $\pi(y,x) = p^\varepsilon(y - Zx)$. Assertions (1') and (3') of
Theorem 3.10 show that the eSO-optimal $f_0$ in our "Bayesian" model of
subsection 3.2 is just $f_0(y) = M^0 y\, \min\{1, \rho/|M^0 y|\}$ with $\rho$ according to (3.17) such
that $\int dP_0^{Y^{\rm di}} = 1$ and $M^0 = \Sigma Z^\tau (Z\Sigma Z^\tau + V)^{-}$.

By assumption, $\Delta X^{\rm rLS}$ lies in the corresponding eSO-neighborhood $\mathcal{U}(r)$ about
$\Delta X^{\mathcal N}$ so the value of the saddle-point from equation (3.19) is also a bound for
the MSE of $X^{\rm rLS}_{t|t}$ on $\mathcal{U}(r)$. □

Remark 6.5. One should mention, however, that due to assumption (2.12) resp.
(3.11), members of an SO-neighborhood $\mathcal{U}'(r')$ about $\mathcal{L}(\Delta X^{\rm rLS}, \Delta Y^{\rm rLS})$ need not lie
in an eSO neighborhood $\mathcal{U}(r + r')$ about $\mathcal{L}(\Delta X^{\mathcal N}, \Delta Y^{\mathcal N})$.